Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam

Authors

  • John M. Mellor-Crummey
  • John Garvin
Abstract

Large-scale scientific applications frequently compute sparse matrix vector products in their computational core. For this reason, techniques for computing sparse matrix vector products efficiently on modern architectures are important. This paper describes a strategy for improving the performance of sparse matrix vector product computations using a loop transformation known as unroll-and-jam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices that have rows with a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an ASCI application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We show that our strategy is effective for improving sparse matrix vector product performance using these matrices on MIPS R12000, Alpha Ev67, IBM Power 3 and Itanium processors. Our measurements show that for this class of sparse matrices, our strategy improves sparse matrix vector product performance from a low of 11% on MIPS to well over a factor of two on Itanium.
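The core idea can be made concrete with a small C sketch. This is an illustrative assumption, not the paper's actual representation: it assumes rows have been grouped so that every row in a group has the same nonzero count (the "predictable lengths" the abstract mentions), stores them in a flat value/column layout, and unrolls the row loop by two. All function names and the unroll factor are hypothetical.

```c
#include <assert.h>

/* Baseline: one dot product per row; every row has exactly `len` nonzeros,
   stored contiguously (a simplifying assumption, not the paper's format). */
void spmv_fixedlen(int nrows, int len, const double *val,
                   const int *col, const double *x, double *y) {
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int j = 0; j < len; j++)
            sum += val[i * len + j] * x[col[i * len + j]];
        y[i] = sum;
    }
}

/* Unroll-and-jam by 2: unroll the row loop, then fuse (jam) the two inner
   loops so two independent accumulators can overlap memory latency. */
void spmv_fixedlen_uaj2(int nrows, int len, const double *val,
                        const int *col, const double *x, double *y) {
    int i;
    for (i = 0; i + 1 < nrows; i += 2) {
        double s0 = 0.0, s1 = 0.0;
        const double *v0 = val + i * len, *v1 = v0 + len;
        const int *c0 = col + i * len, *c1 = c0 + len;
        for (int j = 0; j < len; j++) {
            s0 += v0[j] * x[c0[j]];
            s1 += v1[j] * x[c1[j]];
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < nrows; i++) {        /* cleanup row when nrows is odd */
        double sum = 0.0;
        for (int j = 0; j < len; j++)
            sum += val[i * len + j] * x[col[i * len + j]];
        y[i] = sum;
    }
}
```

The fixed row length is what makes the jam legal here: with a conventional CSR format the inner trip counts differ from row to row, so the two inner loops could not be fused this simply.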


Related references

Register Pressure Guided Unroll-and-Jam

Unroll-and-jam is an effective loop optimization that not only improves cache locality and instruction level parallelism (ILP) but also benefits other loop optimizations such as scalar replacement. However, unroll-and-jam increases register pressure, potentially resulting in performance degradation when the increase in register pressure causes register spilling. In this paper, we present a low ...
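The register-pressure tradeoff can be seen in a short C sketch of unroll-and-jam on a dense matrix-vector product (hypothetical names; the unroll factor of four is arbitrary, not taken from the paper). Each additional unrolled row adds a live accumulator, so register demand grows with the unroll factor, while scalar replacement of `x[j]` is the payoff that motivates the transformation.

```c
#include <assert.h>

/* y = A * x with the row loop unrolled by 4 and the inner loops jammed.
   Four accumulators stay live across the inner loop: the unroll factor
   directly sets the number of registers the jammed loop must hold. */
void matvec_uaj4(int n, int m, double A[n][m], const double *x, double *y) {
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (int j = 0; j < m; j++) {
            double xj = x[j];       /* scalar replacement: one load of x[j],
                                       reused by four multiply-adds */
            s0 += A[i][j] * xj;
            s1 += A[i + 1][j] * xj;
            s2 += A[i + 2][j] * xj;
            s3 += A[i + 3][j] * xj;
        }
        y[i] = s0; y[i + 1] = s1; y[i + 2] = s2; y[i + 3] = s3;
    }
    for (; i < n; i++) {            /* remainder rows */
        double s = 0;
        for (int j = 0; j < m; j++)
            s += A[i][j] * x[j];
        y[i] = s;
    }
}
```

Doubling the unroll factor doubles the live accumulators (plus any replaced scalars); once that count exceeds the available registers, spills turn the intended win into a loss, which is the situation the register-pressure-guided approach aims to avoid.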


Source-to-Source Transformations for Efficient SIMD Code Generation

In recent years, there has been much effort in commercial compilers to generate efficient SIMD instruction sequences from conventional sequential programs. However, the few compilers that can automatically use these instructions achieve, in most cases, unsatisfactory results. Therefore, the code often has to be written manually in assembly language or using compiler bui...


Improving Software Pipelining with Unroll-and-Jam

To take advantage of recent architectural improvements in microprocessors, advanced compiler optimizations such as software pipelining have been developed [1, 2, 3, 4]. Unfortunately, not all loops have enough parallelism in the innermost loop body to take advantage of all of the resources a machine provides. Unroll-and-jam is a transformation that can be used to increase the amount of paralleli...


Deep Jam: Conversion of Coarse-Grain Parallelism to Fine-Grain and Vector Parallelism

A number of computational applications lack instruction-level parallelism. This lack is particularly acute in sequences of dependent instructions on wide-issue or deeply pipelined architectures. We consider four real applications from computational biology, cryptanalysis, and data compression. These applications are characterized by long sequences of dependent instructions, irregular control-fl...


Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY

Sparse matrix-vector multiplication is an important computational kernel that tends to perform poorly on modern processors, largely because of its high ratio of memory operations to arithmetic operations. Optimizing this algorithm is difficult, both because of the complexity of memory systems and because the performance is highly dependent on the nonzero structure of the matrix. The Sparsity sy...
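Register-level reuse of this kind is commonly obtained by blocking the sparse matrix into small dense tiles. The sketch below shows one such scheme, 2x2 block CSR (BCSR), as an illustration in the spirit of register blocking; the block size, layout, and names are assumptions for this example, not the Sparsity system's actual code.

```c
#include <assert.h>

/* SpMV over 2x2 block CSR: ptr/bcol index dense 2x2 blocks (row-major in
   bval). Two output accumulators and two x entries stay in registers and
   are each reused across the block, raising the flops-per-load ratio. */
void spmv_bcsr2x2(int nblockrows, const int *ptr, const int *bcol,
                  const double *bval, const double *x, double *y) {
    for (int I = 0; I < nblockrows; I++) {
        double y0 = 0.0, y1 = 0.0;             /* two rows of output */
        for (int k = ptr[I]; k < ptr[I + 1]; k++) {
            const double *b = bval + 4 * k;    /* one dense 2x2 block */
            double x0 = x[2 * bcol[k]];        /* each x entry is loaded */
            double x1 = x[2 * bcol[k] + 1];    /* once, used twice        */
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * I] = y0;
        y[2 * I + 1] = y1;
    }
}
```

The tradeoff is explicit zeros: blocks that are not fully dense waste storage and arithmetic, which is why block-size selection depends on the matrix's nonzero structure, as the excerpt notes.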




Journal:
  • IJHPCA

Volume 18, Issue

Pages  -

Published 2004